MoNoise: Modeling Noise Using a Modular Normalization System

Authors

  • Rob van der Goot
  • Gertjan van Noord
Abstract

We propose MoNoise, a normalization model focused on generalizability and efficiency that aims to be easily reusable and adaptable. Normalization is the task of translating texts from a noncanonical domain to a more canonical domain, in our case from social media data to standard language. Our proposed model is based on modular candidate generation, in which each module is responsible for a different type of normalization action. The most important generation modules are a spelling correction system and a word embeddings module. Depending on the definition of the normalization task, a static lookup list can be crucial for performance. We train a random forest classifier to rank the candidates, which generalizes well to all different types of normalization actions. Most features for the ranking originate from the generation modules; besides these features, N-gram features prove to be an important source of information. We show that MoNoise beats the state of the art on different normalization benchmarks for English and Dutch, which all define the task of normalization slightly differently.
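The generate-then-rank design described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the MoNoise implementation: the toy vocabulary, lookup list, and unigram counts are invented, the word embeddings module is omitted, and a hand-weighted linear score stands in for the trained random forest classifier.

```python
from typing import Callable, Dict, Set

# Hypothetical toy resources, invented for this sketch (not from the paper).
VOCAB = {"you", "tomorrow", "see", "are", "great", "to", "too"}
LOOKUP = {"u": "you", "2moro": "tomorrow", "r": "are"}
UNIGRAM_COUNTS = {"you": 50, "tomorrow": 10, "see": 20, "are": 40,
                  "great": 5, "to": 60, "too": 8}
TOTAL = sum(UNIGRAM_COUNTS.values())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


# Each generation module proposes candidates for one kind of normalization action.
def gen_original(word: str) -> Set[str]:
    return {word}  # keeping the word unchanged is always a candidate


def gen_lookup(word: str) -> Set[str]:
    return {LOOKUP[word]} if word in LOOKUP else set()


def gen_spelling(word: str) -> Set[str]:
    return {v for v in VOCAB if edit_distance(word, v) <= 1}


MODULES: Dict[str, Callable[[str], Set[str]]] = {
    "original": gen_original,
    "lookup": gen_lookup,
    "spelling": gen_spelling,
}


def generate(word: str) -> Dict[str, Set[str]]:
    """Map each candidate to the names of the modules that proposed it."""
    candidates: Dict[str, Set[str]] = {}
    for name, module in MODULES.items():
        for cand in module(word):
            candidates.setdefault(cand, set()).add(name)
    return candidates


def features(cand: str, sources: Set[str]) -> Dict[str, float]:
    """Features come mostly from the generation modules, plus a unigram feature."""
    return {
        "from_lookup": float("lookup" in sources),
        "from_spelling": float("spelling" in sources),
        "is_original": float("original" in sources),
        "in_vocab": float(cand in VOCAB),
        "unigram": UNIGRAM_COUNTS.get(cand, 0) / TOTAL,
    }


# Hand-set weights: a stand-in for the trained random forest ranker.
WEIGHTS = {"from_lookup": 2.0, "from_spelling": 0.5, "is_original": 0.2,
           "in_vocab": 1.0, "unigram": 1.0}


def score(feats: Dict[str, float]) -> float:
    return sum(WEIGHTS[name] * value for name, value in feats.items())


def normalize(sentence: str) -> str:
    """Pick the highest-scoring candidate for every whitespace-separated token."""
    output = []
    for word in sentence.split():
        candidates = generate(word)
        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates),
                   key=lambda c: score(features(c, candidates[c])))
        output.append(best)
    return " ".join(output)


print(normalize("see u 2moro"))
```

Because every module only adds entries to a shared candidate set and reports itself as a feature, a new module (for example, one backed by word embeddings) could be added to `MODULES` without changing the ranking code, which is the kind of reusability the abstract emphasizes.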


Similar resources

Inverse modeling of gravity field data due to finite vertical cylinder using modular neural network and least-squares standard deviation method

In this paper, modular neural network (MNN) inversion has been applied for the parameters approximation of the gravity anomaly causative target. The trained neural network is used for estimating the amplitude coefficient and depths to the top and bottom of a finite vertical cylinder source. The results of the applied neural network method are compared with the results of the least-squares stand...


An Improved Modular Modeling for Analysis of Closed-Cycle Absorption Cooling Systems

A detailed modular model of an absorption cooling system is presented in this paper. The model, including the key components, is described in terms of design parameters, inputs, control variables, and outputs. The model is used to simulate the operating conditions for estimating the behavior of individual components and system performance, and to conduct a sensitivity analysis based on the give...


Improving Data-based Wind Turbine Using Measured Data Foggy Method

The purpose of this paper is to improve the modeling of a data-driven wind turbine system that receives data from noisy signals. Most data from industrial systems is noisy, and data noise is inevitable and natural. The method and idea proposed in this paper, Data Fogging, significantly reduce the impact of noise on data-driven wind turbine system modeling, which is the basis of this met...


An Optimized Online Secondary Path Modeling Method for Single-Channel Feedback ANC Systems

This paper proposes a new method for online secondary path modeling in feedback active noise control (ANC) systems. In practical cases, the secondary path is usually time-varying. For these cases, online modeling of the secondary path is required to ensure convergence of the system. In the literature, the secondary path estimation is usually performed offline, prior to online modeling, where in the prop...


Mapping of noise pollution by different interpolation methods in recovery section of Ghandi telecommunication Cables Company

Background: Noise pollution and workers' noise exposure are common in industrial factories in Iran. In order to reduce this noise pollution, evaluation and investigation of noise emission are both necessary. In this study, different noise mapping methods are used for determining the distribution of noise. Materials and Methods: In the present study, for preparing a noise map i...



Journal:
  • CoRR

Volume: abs/1710.03476

Publication date: 2017